Abstract

Computing the Newton step for a generic function $f : \mathbb{R}^N \to \mathbb{R}$ takes $O(N^3)$ flops. In this paper, we explore avenues for reducing this bound when the computational structure of $f$ is known beforehand. It is shown that the Newton step can be computed in time linear in the size of the computational graph and cubic in its tree-width.


1 Introduction 

Newton's method forms the basis for many second-order methods in nonlinear optimization; it is also the core technique used in interior point methods. Its applicability to large-scale programming, however, is often limited by the run-time complexity of computing the Newton step.

For a generic function $f : \mathbb{R}^N \to \mathbb{R}$, computing the Hessian requires at least $O(N^2)$ flops; further, inverting the matrix requires $O(N^\gamma)$ flops ($\gamma = 3$, in practice). This is computationally infeasible for many problems in practice.

Often, however, one is also given access to the computational structure of the objective. The computer routine for calculating the objective $f(\cdot)$ can be represented as a Directed Acyclic Graph [DAG] mapping inputs to $f(\cdot)$ via intermediary nodes.

For instance, the objective function for the canonical optimal-control problem is given by,

$$\min_{u_0,\dots,u_n} J(u_0,\dots,u_n) = \left[\sum_{i=0}^{n-1} l_i(x_i,u_i) + l_n(x_n)\right], \qquad \forall i,\ x_{i+1} \leftarrow f(x_i,u_i), \tag{1}$$

where the dynamics and local objectives of the system are given by $f(\cdot,\cdot)$ and $l_i(\cdot,\cdot)$ respectively. The infix operator '←' indicates that the value appearing on the right-hand side is given the placeholder symbol present to its left; we explicitly distinguish this from the '=' operator, which is taken to represent a constraint.


Figure 1: Optimal control problem. The dynamical system states are represented by nodes $\{x_i\}$, and the controls by nodes $\{u_i\}$.

The order of computation for the objective (1) can be represented by a linear chain (Figure 1). Lacking constraints, the apparent sparsity in (1) is entirely destroyed once all the placeholders are substituted for,

$$J(u_0,\dots,u_n) = l_0(x_0,u_0) + l_1(f(x_0,u_0),u_1) + l_2(f(f(x_0,u_0),u_1),u_2) + \cdots$$

The Hessian of $J(\cdot)$ is thus dense, which implies a run-time that is cubic in the input dimension for the Newton step computation; computing the Hessian itself is quadratic.
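The loss of sparsity can be seen on a small instance. The sketch below is only an illustration under assumed choices that are not in the text: scalar linear dynamics $x_{i+1} = a x_i + u_i$ and quadratic local costs $\frac{1}{2}(x_i^2 + u_i^2)$, with the hypothetical helper name `control_hessian`. It assembles the Hessian of $J$ with respect to the controls from the sensitivities $\partial x_i / \partial u_j$ and counts its nonzero entries.

```python
def control_hessian(n, a):
    """Hessian of J(u) for the assumed toy problem x_{i+1} = a*x_i + u_i
    with local costs 0.5*(x_i^2 + u_i^2) and terminal cost 0.5*x_n^2."""
    # sens[i][j] = d x_i / d u_j, which is nonzero only for j < i
    sens = [[a ** (i - 1 - j) if j < i else 0.0 for j in range(n)]
            for i in range(n + 1)]
    hess = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            # identity from the control costs plus sum_i sens_i sens_i^T
            hess[j][k] = (1.0 if j == k else 0.0) + sum(
                sens[i][j] * sens[i][k] for i in range(n + 1))
    return hess


H = control_hessian(5, 0.9)
num_nonzero = sum(1 for row in H for h in row if abs(h) > 1e-12)
```

For any nonzero `a`, every pair of controls influences at least one common later state, so all entries of `H` are nonzero even though the chain itself is sparse.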

By contrast, once the problem (1) is written in its constrained form (by replacing '←' with '='), the sparsity of the resulting Karush-Kuhn-Tucker [KKT] system readily allows for computing the SQP/Lagrange-Newton step in linear time [?]. Such a transformation, however, comes at the cost of increasing the size of the optimization problem, abandoning state feasibility, and increased implementation complexity.

The question this paper answers is whether there exist general techniques that exploit the sparsity of the problem while working solely with the input variables. Note that these are not merely questions about elimination orders, but are also algebraic in nature.

Automatic Differentiation: Research on Automatic Differentiation [AD] has produced many techniques for exploiting the computational structure of generic functions. They are routinely employed for efficient calculation of gradients and Hessian vector products [?]. The applicability of AD to second-order optimization is, however, quite limited.

AD is typically used either for computing the entire Hessian matrix, or for calculating Hessian vector products for use in Nonlinear Conjugate Gradient Descent [CG]. Hessians are computed by accumulating one column at a time via calls to the Hessian vector product routine [?]. The sparsity of the Hessian can be exploited to reduce the number of such calls [?], but structured problems such as (1) will not allow for any such economy. Compositional chains of functions, such as those in optimal control (1) (Figure 1), not only serve to make Hessians dense but can also lead to condition numbers that are exponential in their diameters. Large condition numbers are likely to negate any computational advantages offered by methods like CG.

The above techniques form the Hessian matrix, directly or indirectly, before computing the Newton step. This stands in contrast with the root-finding problem, for which there do exist methods for directly computing the Newton-Raphson step [?]. The root-finding problem involves the inversion of the Jacobian of a function, rather than the Hessian, and these methods reduce this computation to that of inverting a sparse matrix [17]. The Newton step (for optimization) can also be computed using this method by formulating it as a root-finding problem on the gradient. This, however, results in non-symmetric matrices depending on the computational graph of the gradient, as opposed to that of the function itself. The latter graph is transitively closed, and hence analyses for the above Newton-Raphson AD algorithm apply to the gradient, and are difficult to extend to the underlying objective [?].

Dynamic Programming: The question posed earlier has already been answered in the affirmative for the optimal control problem. There exists an algorithm for optimal control, based on Dynamic Programming, that exploits algebraic dependencies in (1) in order to compute the Newton step in only linear time [?].

The run-time of this algorithm is the direct result of the sparsity of the corresponding constrained problem [?]. The band structure of the relevant KKT system allows for solving the system in linear time [19]. The relationship between computing the Newton step (Hessian of the objective) and computing the Lagrange-Newton step (Hessian of the Lagrangian) is established by noting that there exist multiplier values such that both compute the same result [?].

Such algorithms are routinely employed by practitioners for updating control policies in real time, while maintaining a feasible trajectory. These algorithms have been extended to Extended Kalman Filtering (EKF), as well as various other formulations of the control problem [?].

Overview: We generalize such algorithms, by using Hessian vector product equations from AD, to relate the computation of the Newton step and the Lagrange-Newton step for arbitrary structured objectives.

We then extend this framework to structured optimization problems with equality constraints.

Further, we show that solving the resultant KKT systems can be accomplished in time $O(tw^3)$, where 'tw' is the tree-width of the canonical computational graph.

Finally, we show results from numerical experiments.

2 Notation 

Let $\mathcal{G}$ be a Directed Acyclic Graph [DAG], and let each vertex $v \in V[\mathcal{G}]$ be associated with a state $S_v \in U_v \subset \mathbb{R}^{n_v}$, taking values in an open set. Denote by $\delta^+(v)$ the parents of $v \in V[\mathcal{G}]$, and by $\delta^-(v)$ its children; let $S_A$ be the (labelled) concatenation of states associated with vertices in the set $A \subset V[\mathcal{G}]$. Define the set of input nodes $X = \{x_1, x_2, \dots, x_n\} = \{v \mid \delta^+(v) = \emptyset,\ v \in V[\mathcal{G}]\}$ to be the parentless vertices of $\mathcal{G}$.

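An objective function $f : U_{x_1} \times \dots \times U_{x_n} \to \mathbb{R}$ has the computational structure given by the tuple $(\mathcal{G}, \{\varphi_v\}, \{l_v\})$ if it can be written as the sum of local objectives $l_v : \prod_{u \in v \cup \delta^+(v)} U_u \to \mathbb{R}$ on the graph,

$$f : (S_{x_1}, \dots, S_{x_n}) \mapsto \sum_{v \in V[\mathcal{G}]} l_v(S_{v \cup \delta^+(v)}), \qquad S_v \leftarrow \varphi_v(S_{\delta^+(v)}),\ \forall v \in V[\mathcal{G}],\ \delta^+(v) \neq \emptyset. \tag{2}$$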

The state of a non-input node $v \in V[\mathcal{G}]$ in (2) is defined recursively as $S_v \leftarrow \varphi_v(S_{\delta^+(v)})$, for some given function $\varphi_v : \prod_{u \in \delta^+(v)} U_u \to U_v$. It follows, since $\mathcal{G}$ is a DAG, that $S_{V[\mathcal{G}]}$, and hence $f(\cdot)$, is uniquely determined from the input $S_X$ and the functions $\{\varphi_v\}$. The order of computation for the objective is given by the topological ordering of $\mathcal{G}$, and the DAG $\mathcal{G}$ is called the computational graph of $f(\cdot)$. The computer routine for calculating any objective function can be represented by such a structure [?].
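As a concrete reading of this definition, the following sketch evaluates an objective from a tuple of parents, state functions and local objectives by visiting the vertices in topological order. The function name `evaluate`, the dictionary layout, and the toy chain (dynamics $0.5x + u$, squared local costs) are assumptions made for illustration only.

```python
def evaluate(parents, phi, local_cost, inputs):
    """Evaluate f on a computational graph.

    parents[v] is the tuple of parents of v (empty for input nodes);
    the keys of `parents` are assumed to be listed in topological order.
    """
    state = dict(inputs)
    total = 0.0
    for v in parents:
        par_states = [state[p] for p in parents[v]]
        if parents[v]:
            state[v] = phi[v](*par_states)          # S_v <- phi_v(S_parents)
        total += local_cost[v](state[v], *par_states)  # l_v(S_v, S_parents)
    return total, state


# Assumed toy chain: x1 <- dyn(x0, u0), x2 <- dyn(x1, u1), squared local costs.
def dyn(x, u):
    return 0.5 * x + u

parents = {"x0": (), "u0": (), "u1": (), "x1": ("x0", "u0"), "x2": ("x1", "u1")}
phi = {"x1": dyn, "x2": dyn}
local_cost = {v: (lambda s, *par: s * s) for v in parents}
value, state = evaluate(parents, phi, local_cost, {"x0": 1.0, "u0": 0.5, "u1": -1.0})
```

Only the input states are supplied; every other state is a placeholder filled in by the recursion, which is what distinguishes '←' from an equality constraint.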

In the following sections, the symbolism $\partial_u v$ is used as a shorthand for $\left.\frac{\partial S_v}{\partial S_u}\right|_{S_X}$. The derivatives of functions with respect to $S_u$ are similarly denoted by the operator $\partial_u$; that with respect to a (labelled) set $A = \{a_1, a_2, \dots\} \subset V[\mathcal{G}]$ by $\partial_A = [\partial_{a_1}, \partial_{a_2}, \dots]$.

3 Newton step 

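Consider the objective function in (2), defined by the tuple $(\mathcal{G}, \{\varphi_v\}, \{l_v\})$. The optimization problem of interest is the following.

$$\min_{S_{x_1},\dots,S_{x_n}} \left[ f = \sum_{v \in V[\mathcal{G}]} l_v(S_{v \cup \delta^+(v)}) \right], \qquad S_v \leftarrow \varphi_v(S_{\delta^+(v)}),\ \forall v \in V[\mathcal{G}],\ \delta^+(v) \neq \emptyset, \tag{3}$$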


and the corresponding constrained problem is obtained by replacing the operator '←' by '=' in (3).

In the following, we first consider the constrained formulation of (3), and define the KKT system involved in computing the Lagrange-Newton step; we then relate these to computing the Newton step.


Since $\mathcal{G}$ is a DAG, there exist child-less nodes (i.e. $\delta^-(v) = \emptyset$), from which the above recursion can be initialized. The recursion then proceeds backward in the depth-first search order on $\mathcal{G}$. This algorithm is known as reverse-mode AD [13].

Hessian vector AD: A change in the inputs $\delta S_X$ results in the first-order change in the derivative, $\delta[\partial_v f] = \partial^2_{Xv} f \cdot \delta S_X$, which is given by the Hessian vector product. Computing the Newton step is thus equivalent to finding a $\delta S_X$ such that $\delta[\partial_X f] = -\partial_X f$.

Applying the chain rule over the DAG $\mathcal{G}$, for all terms in (7), we obtain,


3.1 Lagrange-Newton 


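The Lagrangian for the constrained form of (3) is given by,

$$\mathcal{L}(S_{V[\mathcal{G}]}, \lambda) = \sum_{v \in V[\mathcal{G}]} l_v(S_{v \cup \delta^+(v)}) + \sum_{\substack{v \in V[\mathcal{G}],\\ \delta^+(v) \neq \emptyset}} \lambda_v^T h_v(S_{v \cup \delta^+(v)}),$$

where,

$$\forall v \in V[\mathcal{G}],\ \delta^+(v) \neq \emptyset, \qquad h_v(S_{v \cup \delta^+(v)}) = \varphi_v(S_{\delta^+(v)}) - S_v, \tag{4}$$

and the vector $\lambda$ is the labelled concatenation of all $\lambda_v$'s.

The necessary first-order conditions for optimality of this problem are given by [?],

$$\partial_V \mathcal{L}(S_V^*, \lambda^*) = 0, \qquad h(S_V^*) = 0. \tag{5}$$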


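$$\forall v,\quad \delta[\partial_v f] = \sum_{s \in v \cup \delta^-(v)}\ \sum_{a \in v \cup \delta^+(s)} \partial^2_{va} l_s \cdot \delta S_a + \sum_{d \in \delta^-(v)} \Big( \delta[\partial_d f]^T \partial_v d + \sum_{a \in \delta^+(d)} (\partial_d f\, \partial^2_{av} d) \cdot \delta S_a \Big),$$

$$\forall a,\quad \delta S_a = \sum_{d \in \delta^+(a)} \partial_d a \cdot \delta S_d. \tag{8}$$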


These equations can be solved, for a given $\delta S_X$, by a forward-backward recursion similar to the one used for solving (7) [?]. Computing the Hessian-vector product in this manner takes time $\tilde{O}(\omega(\tilde{\mathcal{G}})^2)$¹ [9], where $\omega(\tilde{\mathcal{G}})$ is the clique number of the moralization $\tilde{\mathcal{G}}$ of $\mathcal{G}$.


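The Lagrange-Newton step for solving this system of equations, around a nominal $(S_V, \lambda)$, entails solving the following KKT system [12].

$$\begin{bmatrix} \partial^2_V \mathcal{L} & \partial_V h^T \\ \partial_V h & 0 \end{bmatrix} \begin{bmatrix} \delta S_V \\ \delta \lambda \end{bmatrix} = \begin{bmatrix} -\partial_V \mathcal{L} \\ -h \end{bmatrix} \tag{6}$$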


Sequential Quadratic Programming [SQP] involves taking a step along $(\delta S_V, \delta\lambda)$ and iteratively solving for the first-order conditions. In the following section, it will be shown that there exist values for the Lagrange multipliers, depending only on the inputs, such that the solution to (6) yields the Newton step for the unconstrained objective.


3.2 Unconstrained Newton 


We recollect certain definitions from AD, and then present one of the central results of the paper.


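Reverse AD: The first derivatives of the objective $f(\cdot)$ can be calculated by applying the chain rule over $\mathcal{G}$,

$$\forall v,\quad \partial_v f = \sum_{s \in v \cup \delta^-(v)} \partial_v l_s + \sum_{d \in \delta^-(v)} \partial_d f^T\, \partial_v d; \qquad v \in \delta^+(d) \Rightarrow \partial_v d = \frac{\partial \varphi_d(S_{\delta^+(d)})}{\partial S_v}. \tag{7}$$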
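The adjoint recursion above can be sketched for scalar states as follows. This is a minimal illustration rather than a general AD tool: the function name `reverse_ad`, the callback signatures, and the toy chain (dynamics $0.5x + u$, squared local costs) are all assumptions introduced here.

```python
def reverse_ad(parents, dphi, dlocal, state):
    """Adjoint recursion for scalar states:
    grad[v] = sum_{s in v + children(v)} d_v l_s + sum_{d in children(v)} grad[d] * d_v phi_d.
    The keys of `parents` are assumed to be in topological order."""
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    grad = {}
    for v in reversed(list(parents)):            # children are visited first
        g = dlocal[v](v, state)                  # d_v l_v
        for d in children[v]:
            g += dlocal[d](v, state)             # d_v l_d, if l_d depends on S_v
            g += grad[d] * dphi[d](v, state)     # chain rule through phi_d
        grad[v] = g
    return grad


# Assumed toy chain: x1 <- 0.5*x0 + u0, x2 <- 0.5*x1 + u1, local costs l_v = S_v^2.
parents = {"x0": (), "u0": (), "u1": (), "x1": ("x0", "u0"), "x2": ("x1", "u1")}
coef = {"x1": {"x0": 0.5, "u0": 1.0}, "x2": {"x1": 0.5, "u1": 1.0}}
dphi = {d: (lambda v, st, d=d: coef[d].get(v, 0.0)) for d in coef}
dlocal = {s: (lambda v, st, s=s: 2.0 * st[s] if v == s else 0.0) for s in parents}
state = {"x0": 1.0, "u0": 0.5, "u1": -1.0, "x1": 1.0, "x2": -0.5}
grad = reverse_ad(parents, dphi, dlocal, state)
```

The entries of `grad` at the input nodes are the gradient of the objective; the entries at the remaining nodes are the quantities that Theorem 1 uses as multiplier values.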


Newton step: The problem of interest is, however, the exact inverse: find a $\delta S_X$ such that $\delta[\partial_X f] = -\partial_X f$. This question is answered by the following theorem.

Theorem 1 (Newton step) The Newton step for the objective (3) is given by the Lagrange-Newton step (6), when $S_V$ is feasible and when $\forall v,\ \lambda_v = \partial_v f$ as defined in (7).

Proof The second equation in (8) is equivalent to $\partial_V h \cdot \delta S_V = -h$ in (6). Rearranging the first equation from (8), and setting $\delta[\partial_v f] = -\partial_v f$ for all inputs, we obtain $\forall v$,


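$$0 = \sum_{\substack{s \in v \cup \delta^-(v),\\ a \in v \cup \delta^+(s)}} \partial^2_{va} l_s\, \delta S_a + \sum_{\substack{d \in \delta^-(v),\\ a \in \delta^+(d)}} (\partial_d f\, \partial^2_{av} d)\, \delta S_a + \begin{cases} -\delta[\partial_v f], & \delta^+(v) \neq \emptyset \\ \partial_v f, & \text{otherwise} \end{cases} + \sum_{d \in \delta^-(v)} (\partial_v d)^T \delta[\partial_d f]. \tag{9}$$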


Similarly, expanding the top block in (6) using the definitions in (3) & (6), we obtain $\forall v$,

1 We use $\tilde{O}(\cdot)$ to hide factors linear in $|E| + |V|$.

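$$-\partial_v \mathcal{L} = \sum_{\substack{s \in v \cup \delta^-(v),\\ a \in v \cup \delta^+(s)}} \partial^2_{va} l_s\, \delta S_a + \sum_{\substack{d \in \delta^-(v),\\ a \in \delta^+(d)}} (\lambda_d^T\, \partial^2_{av} d)\, \delta S_a + \begin{cases} -\delta\lambda_v, & \delta^+(v) \neq \emptyset \\ 0, & \text{otherwise} \end{cases} + \sum_{d \in \delta^-(v)} (\partial_v d)^T \delta\lambda_d, \tag{10}$$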

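where,

$$\partial_v \mathcal{L} = \begin{cases} \sum_{s \in v \cup \delta^-(v)} \partial_v l_s + \sum_{d \in \delta^-(v)} \lambda_d^T\, \partial_v d, & \delta^+(v) = \emptyset \\ \sum_{s \in v \cup \delta^-(v)} \partial_v l_s + \sum_{d \in \delta^-(v)} \lambda_d^T\, \partial_v d - \lambda_v, & \text{otherwise.} \end{cases} \tag{11}$$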


The result follows from equations (7), (9), (10) & (11). □


Graphical Newton: The above theorem immediately 
yields the following optimization algorithm. 


Algorithm 1 Graphical Newton
1: Input: initial $S_X$, tuple $(\mathcal{G}, \{\varphi_v\}, \{l_v\})$
2: repeat
3:   Compute $f$, $\{\partial_v f\}$, $\{\partial^2 \varphi_v\}$.
4:   Compute the SQP step from (6), with $\lambda_v = \partial_v f,\ \forall v$.
5:   Compute step-length $\eta$ via linesearch on inputs $S_X$.
6:   Update inputs: $S_X \leftarrow S_X + \eta\, \delta S_X$.
7: until $\|\partial_X f\| < \epsilon$


The run-time of every iteration in Algorithm 1 depends crucially upon the time required to solve (6). The run-time bounds for solving such KKT systems are taken up later in the paper.
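Theorem 1 and Algorithm 1 can be checked by hand on a two-node graph. The sketch below assumes a toy problem that is not from the paper: one input $u$, one state $x \leftarrow u^2$, and local objectives $l_u = \frac{1}{2}u^2$, $l_x = \frac{1}{2}(x-2)^2$, so that $f(u) = \frac{1}{2}u^2 + \frac{1}{2}(u^2-2)^2$. The dense solver `solve` and the crude backtracking rule are stand-ins chosen for brevity, not the sparse solver or linesearch used in the experiments.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= t * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def f(u):
    return 0.5 * u * u + 0.5 * (u * u - 2.0) ** 2

def kkt_step(u):
    x = u * u                       # feasible state: x <- phi(u) = u^2
    lam = x - 2.0                   # lambda_x = d_x f, from reverse AD
    grad_u = u + lam * 2.0 * u      # d_u f = d_u l_u + lambda_x * d_u phi
    K = [[1.0 + 2.0 * lam, 0.0, 2.0 * u],   # Hessian of the Lagrangian, row for u
         [0.0, 1.0, -1.0],                  # row for x; d_x h = -1
         [2.0 * u, -1.0, 0.0]]              # linearized constraint h = u^2 - x
    du, _dx, _dlam = solve(K, [-grad_u, 0.0, 0.0])
    return du, grad_u

def newton_step(u):
    # closed form -f'(u) / f''(u) for the assumed toy objective
    return -(u + 2.0 * u * (u * u - 2.0)) / (6.0 * u * u - 3.0)

def graphical_newton(u, tol=1e-8, max_iter=50):
    for _ in range(max_iter):
        du, grad_u = kkt_step(u)
        if abs(grad_u) < tol:
            break
        eta = 1.0
        while f(u + eta * du) > f(u) and eta > 1e-8:   # crude backtracking
            eta *= 0.5
        u += eta * du
    return u

u_star = graphical_newton(2.0)
```

With the multiplier set to $\partial_x f$ and the state kept feasible, the first component of the KKT solution coincides with the ordinary Newton step on $f(u)$, and the iteration started at $u = 2$ settles at the stationary point $u = \sqrt{3/2}$.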

3.3 Extension to equality constraints 

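Consider optimization problems which have equality constraints in addition to the structured objective from before,

$$\min_{S_{x_1},\dots,S_{x_n}} \left[ f = \sum_{v \in V[\mathcal{G}]} l_v(S_{v \cup \delta^+(v)}) \right], \qquad S_v \leftarrow \varphi_v(S_{\delta^+(v)}),\ \forall v \in V[\mathcal{G}],\ \delta^+(v) \neq \emptyset, \qquad c(S_C) = 0, \tag{12}$$

where $c(\cdot) = 0$ is an additional equality constraint, which depends on the variables $C \subset V[\mathcal{G}]$. The Lagrangian for this problem is given by,

$$\tilde{\mathcal{L}}(S_{V[\mathcal{G}]}, \lambda) = \mathcal{L}(S_{V[\mathcal{G}]}, \lambda_{V \setminus X}) + \lambda_c^T c(S_C), \tag{13}$$

where $\mathcal{L}$ is as defined in (13), and $\lambda_{V \setminus X}$ is the corresponding set of multipliers; the variable $\lambda$ is the concatenation of $\lambda_c$ and all multipliers $\lambda_{V \setminus X}$ appearing in (13).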


Theorem 1 can be applied to this problem by treating $\lambda_c^T c(S_C)$ as another cost function in the objective, while also including the constraint in the KKT system (6). The iteration can then proceed by solving the KKT system with $\lambda_v = \partial_v (f + \lambda_c^T c),\ \forall v$, and using a merit function for the linesearch procedure; the variables $(S_X, \lambda_c)$ are updated accordingly. We omit the proof of the validity of this method.


4 Message Passing 

The classical run-time bound for Cholesky factorization (i.e. Gaussian Belief Propagation²) [?][18] cannot be extended to problems such as (6), because of the appearance of linear constraints. Such bounds for structured KKT systems do not appear to be known within the sparse linear algebra community [?].

In this section, we provide a Message Passing algorithm for solving such KKT systems, and show that it has a run-time bound of $\tilde{O}(tw^3)$³ given the tree-decomposition.


4.1 Hypergraph structured QPs 


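For a hypergraph $\mathcal{H}$, denote the adjacency and incidence matrices by $A[\mathcal{H}]$ & $B[\mathcal{H}]$ respectively.

$$A[\mathcal{H}] \in \mathbb{R}^{|V[\mathcal{H}]| \times |V[\mathcal{H}]|}, \quad B[\mathcal{H}] \in \mathbb{R}^{|E[\mathcal{H}]| \times |V[\mathcal{H}]|};$$

$$A[\mathcal{H}]_{uv} = \begin{cases} 1 & \exists e \in E[\mathcal{H}],\ u, v \in e \\ 0 & \text{otherwise} \end{cases} \qquad B[\mathcal{H}]_{eu} = \begin{cases} 1 & u \in e \\ 0 & \text{otherwise} \end{cases} \tag{14}$$

Given such a hypergraph $\mathcal{H}$, the family of QPs we are interested in solving is the following.

$$\min_x \sum_{e \in E[\mathcal{H}]} \left( \tfrac{1}{2} S_e^T Q_e S_e - b_e^T S_e \right), \qquad \forall e \in E[\mathcal{H}],\ G_e S_e = h_e. \tag{15}$$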


Assuming that the QP has a bounded solution and that the constraints are full rank, the minimizer of (15) is given by the solution to the following KKT system,

$$\begin{bmatrix} Q & G^T \\ G & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} b \\ h \end{bmatrix}, \qquad x, b \in \mathbb{R}^{|V|}, \tag{16}$$

2 Gaussian-BP computes the LU decomposition of a matrix.

3 The tilde hides factors linear in $|V[\mathcal{H}]|$, $|E[\mathcal{H}]|$.


















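where $Q, G, \lambda, x, b$ are concatenations of the terms defined in (15) respectively. The sparsity/support of (16) is closely related to $\mathcal{H}$, since,

$$\mathrm{supp}(Q) \subset \mathrm{supp}(A[\mathcal{H}]), \qquad \forall i,\ \exists e,\ \mathrm{supp}(G_{i,:}) \subset \mathrm{supp}(B[\mathcal{H}]_{e,:}).$$

Every row of the constraint, $G_{i,:}$, has the same sparsity as some edge $e \in E[\mathcal{H}]$.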

Tree decomposition: Extending the notion of Dynamic Programming to non-trees (including hypergraphs) requires a partitioning of the graph so as to satisfy a lifted notion of being a tree [?]. Tree decomposition captures the essence of such graph partitions.

Definition 1 (Tree decomposition) A tree-decomposition of a hypergraph $\mathcal{H}$ consists of a tree $T$ and a map $\chi : V[T] \to 2^{V[\mathcal{H}]}$ such that,


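Algorithm 2 Graphical QP
1: Given: $T, \mathcal{H}, \{Q_e\}, \{b_e\}, \{G_e\}, \{h_e\}$.
2:
3: function GatherMessage($l, p, T$)
4:   $(\bar{Q}_l, \bar{b}_l, \bar{G}_l, \bar{h}_l) \leftarrow (Q_l, b_l, G_l, h_l)$
5:   for $c \in \delta_T(l) \setminus p$ do
6:     $(Q_{c \to l}, G_{c \to l}, b_{c \to l}, h_{c \to l}) \leftarrow$ GatherMessage($c, l, T$)
7:     $(\bar{Q}_l, \bar{b}_l) \leftarrow (\bar{Q}_l, \bar{b}_l) + (Q_{c \to l}, b_{c \to l})$
8:     $\bar{G}_l \leftarrow [\bar{G}_l; G_{c \to l}]$, $\bar{h}_l \leftarrow [\bar{h}_l; h_{c \to l}]$
9:   end for
10:  return Factorize($\chi(l), \chi(p), \bar{Q}_l, \bar{b}_l, \bar{G}_l, \bar{h}_l$)
11:
12: function Factorize($\chi(l), \chi(p), Q, b, G, h$)
13:  $(\iota, \bar\iota) \leftarrow (\chi(l) \setminus \chi(p),\ \chi(l) \cap \chi(p))$
14:  $r \leftarrow \mathrm{rank}(Q_{\iota\iota})$
15:  return Gaussian-BP messages from (17).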


i (Vertex cover) $\bigcup_{i \in V[T]} \chi(i) = V[\mathcal{H}]$.

ii (Edge cover) $\forall e \in E[\mathcal{H}],\ \exists i \in V[T],\ e \subset \chi(i)$.

iii (Induced sub-tree) $\forall u \in V[\mathcal{H}]$, $T_u = T[\{i \in V[T] \mid u \in \chi(i)\}]$ is a non-empty subtree.

The tree-width of a tree-decomposition $T$ is defined to be $tw(T) = \max_{v \in V[T]} |\chi(v)| - 1$. The tree-width of a graph $\mathcal{H}$ is defined to be the minimal tree-width attained by any tree-decomposition of $\mathcal{H}$.
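The three conditions and the width can be checked mechanically. The sketch below uses assumed names (`is_tree_decomposition`, `width`) and assumes that `tree_edges` already describes a tree; it tests only properties (i) to (iii) for a candidate map from tree nodes to bags.

```python
def is_tree_decomposition(vertices, hyperedges, tree_edges, bags):
    """Check properties (i)-(iii); bags maps each tree node to a set of vertices.
    tree_edges is assumed to describe a tree on the keys of bags."""
    # (i) vertex cover
    if set().union(*bags.values()) != set(vertices):
        return False
    # (ii) edge cover: every hyperedge lies inside some bag
    for e in hyperedges:
        if not any(set(e) <= bag for bag in bags.values()):
            return False
    # (iii) induced sub-tree: tree nodes whose bag contains u must be connected
    adj = {i: set() for i in bags}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for u in vertices:
        nodes = {i for i, bag in bags.items() if u in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in adj[i] & nodes:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return False
    return True

def width(bags):
    return max(len(bag) for bag in bags.values()) - 1


# A path on four vertices, as in the optimal-control chain.
path_edges = [(0, 1), (1, 2), (2, 3)]
good = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}
bad = {0: {0, 1}, 1: {2, 3}, 2: {1, 2}}     # vertex 1 sits in non-adjacent bags
ok_good = is_tree_decomposition(range(4), path_edges, [(0, 1), (1, 2)], good)
ok_bad = is_tree_decomposition(range(4), path_edges, [(0, 1), (1, 2)], bad)
```

For the path, the bags of consecutive pairs give width 1, which is why chain-structured problems admit linear-time solves.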

We define the vertex-induced subgraph in what follows to be $\mathcal{H}[S] = (V[\mathcal{H}], \{e \cap S,\ e \in E[\mathcal{H}]\})$. The following lemma guarantees that such a decomposition ensures local dependence [?].

Lemma 1 (Edge separation) Deleting the edge $xy \in E[T]$ renders $\mathcal{H}[V \setminus (\chi(x) \cap \chi(y))]$ disconnected.


and those on the boundary (i.e. common to $p, l$) by $\bar\iota = \chi(l) \cap \chi(p)$, and $r = \mathrm{rank}(Q_{\iota\iota})$. The function computes Gaussian-BP messages from block pivots {2,3} to {1,4} in (17). Note that, unlike Gaussian-BP, the matrices in (17) are not necessarily positive definite, but are however invertible.


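$$\begin{bmatrix} Q_{\bar\iota\bar\iota} & Q_{\bar\iota\iota} & G_{:r,\bar\iota}^T & G_{r:,\bar\iota}^T \\ Q_{\iota\bar\iota} & Q_{\iota\iota} & G_{:r,\iota}^T & G_{r:,\iota}^T \\ G_{:r,\bar\iota} & G_{:r,\iota} & 0 & 0 \\ G_{r:,\bar\iota} & G_{r:,\iota} & 0 & 0 \end{bmatrix} \begin{bmatrix} S_{\bar\iota} \\ S_{\iota} \\ \lambda_{:r} \\ \lambda_{r:} \end{bmatrix} = \begin{bmatrix} b_{\bar\iota} \\ b_{\iota} \\ h_{:r} \\ h_{r:} \end{bmatrix} \tag{17}$$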


Gaussian Belief-Propagation is essentially a restatement of LU decomposition [4]. Gaussian-BP consists of messages of the form [?][18],


Hypertree structured QP: The tree-decomposition itself can be considered a hypergraph, $(V[\mathcal{H}], \{\chi(i),\ \forall i \in V[T]\})$. Such a hypertree⁴ can also be thought of as a chordal graph [18]. We assume henceforth that the given graph $\mathcal{H}$ is a hypertree, and that $T$ is its tree-decomposition.

The gather stage of the Message Passing algorithm is illustrated in Algorithm 2.⁵

The function Factorize computes the partial LU decomposition of its arguments; we describe its operation below. Denote the vertices that are interior to $l$ by $\iota = \chi(l) \setminus \chi(p)$,

4 There are multiple definitions of a hypertree; we use the term to mean a maximal hypergraph whose tree-decomposition can be expressed in terms of its edges.

5 Note that the addition is performed vertex label-wise in Line 7 of Algorithm 2.


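$$\mu_{i \to j} := [J_{i \to j},\ h_{i \to j}] = [J_{ii},\ h_i] - \sum_{k \in \delta(i) \setminus j} J_{ik}\, J_{k \to i}^{-1}\, [J_{ki},\ h_{k \to i}], \qquad \mu_i = J_{i \to j}^{-1} (h_{i \to j} - J_{ij}\, \mu_j), \tag{18}$$

where $J\mu = h$ is the equation that is to be solved. These can be replaced by appropriate square-root forms to obtain, instead, an LDL decomposition.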
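For intuition, the gather and back-substitution passes can be written out for the simplest tree, a chain with scalar variables. The sketch below is standard Gaussian elimination along a chain, offered as an assumed scalar specialization rather than the paper's block form; the name `chain_gaussian_bp` and the argument layout (diagonal, off-diagonal, right-hand side) are hypothetical.

```python
def chain_gaussian_bp(diag, off, h):
    """Solve J mu = h for a symmetric tridiagonal J (a chain graph).
    diag[i] = J[i][i], off[i] = J[i][i+1] = J[i+1][i]."""
    n = len(diag)
    Jm, hm = diag[:], h[:]
    for i in range(1, n):
        # gather: message from node i-1 into node i (a Schur complement)
        Jm[i] -= off[i - 1] * off[i - 1] / Jm[i - 1]
        hm[i] -= off[i - 1] * hm[i - 1] / Jm[i - 1]
    mu = [0.0] * n
    mu[-1] = hm[-1] / Jm[-1]
    for i in range(n - 2, -1, -1):
        # scatter: back-substitute from the root toward the leaf
        mu[i] = (hm[i] - off[i] * mu[i + 1]) / Jm[i]
    return mu


mu = chain_gaussian_bp([2.0, 2.0, 2.0], [-1.0, -1.0], [1.0, 0.0, 1.0])
```

Each node does a constant amount of work here; in the block setting the same two passes cost cubic time in the bag size, which is where the tree-width enters the bound.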


Theorem 2 The linear equation (16) can be solved in time $\tilde{O}(tw(\mathcal{H})^3)$, given the minimal tree-decomposition, via Algorithm 2.

Proof The correctness of the algorithm follows from Lemma 1. The bound holds trivially if rank(…) < rank(…) at every step of the algorithm. Otherwise, by realizing that … cannot have rank more than $|\chi(p)|$, the proof follows. □

















It follows from Theorem 2 that the KKT system in Algorithm 1 can be solved in time $\tilde{O}(tw(\tilde{\mathcal{G}})^3)$, where $\tilde{\mathcal{G}}$ is the moralization of the computational graph $\mathcal{G}$.

The above proof also ensures that the equivalent sparse LU/LDL decomposition [?], with the same pivot order, has the same run-time. Since decompositions of indefinite systems are subject to instability, use of specialized solvers is generally preferable.

5 Numerical Experiments 

In this section, we present preliminary numerical results with an implementation of Algorithm 1 using the MA57 solver [?]. For ensuring convergence in constrained problems, an augmented Lagrangian merit function was used [?]. The implementation was tested on the following non-standard control problems.

Spring-damper limit cycle: Consider the following spring-damper limit-cycle problem [14].

$$\min_{x_0,\, u[0,T]} \int_0^T \ell(x, \dot{x}, u)\, dt, \qquad \ddot{x} = -(x^3 + \dot{x}^3)/6 + u, \qquad x(0) = x(T) = x_0,\ \dot{x}(0) = \dot{x}(T) = \dot{x}_0,$$

where,

$$\ell(x, \dot{x}, u) = \left(1 - e^{-(x-2)^2} - e^{-(x+2)^2}\right) + \|u\|_2^2.$$

Discretising the derivatives by finite differences, $\dot{x} \approx \Delta x_i / \Delta t = (x_i - x_{i-1}) / \Delta t$, this can be written as the following structured optimization problem,

$$\min \sum_{i}^{N} \ell(x_i, \Delta x_i / \Delta t, u_i), \qquad x_{i+1} \leftarrow x_i + \Delta x_i + (\Delta t)^2 \left[ -(x_i^3 + (\Delta x_i / \Delta t)^3)/6 + u_i \right], \qquad x_0 = x_{N-2},\ x_1 = x_{N-1}. \tag{20}$$

For $N = 100$, $\Delta t = 0.1$, with random initializations, the problem showed robust convergence, often taking no more than ten SQP iterations. The optimal limit cycle and the convergence curves for one run of the algorithm are shown in Figure 2.
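The state recursion in (20) is straightforward to roll out. The sketch below is an illustration only: the names `rollout` and `periodicity_gap` are hypothetical, and reading the closure constraints as residuals on the first two and last two entries of the trajectory is an assumption about the indexing.

```python
def rollout(x0, x1, controls, dt):
    """Roll out x_{i+1} <- x_i + dx_i + dt^2 * (-(x_i^3 + (dx_i/dt)^3)/6 + u_i),
    with dx_i = x_i - x_{i-1}, starting from the two initial states x0, x1."""
    xs = [x0, x1]
    for u in controls:
        x_prev, x = xs[-2], xs[-1]
        dx = x - x_prev
        accel = -(x ** 3 + (dx / dt) ** 3) / 6.0 + u
        xs.append(x + dx + dt * dt * accel)
    return xs

def periodicity_gap(xs):
    """Residuals of the closure constraints, read here as
    xs[0] = xs[-2] and xs[1] = xs[-1] (an assumed indexing)."""
    return xs[0] - xs[-2], xs[1] - xs[-1]


rest = rollout(0.0, 0.0, [0.0] * 5, 0.1)
held = rollout(1.0, 1.0, [1.0 / 6.0] * 3, 0.1)   # control cancels the spring force
```

In the structured formulation only the initial states and the controls are optimization variables; the remaining states are placeholders produced by this recursion, and the closure residuals enter as the equality constraint $c(S_C) = 0$ of Section 3.3.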

6 Discussion 

We have shown that the Newton step can be computed in time $O(tw^3)$, where 'tw' is the tree-width of the computational graph. We have also derived extensions to constrained problems, and provided numerical examples. The technique presented herein also generalizes many specialized algorithms in control.


In certain control problems, the solution to the KKT system can itself be written in feedback form. Given an LU decomposition of the KKT system, one can replace the back-substitution phase by U with a function evaluation that uses L as a control feedback [?]. It is unclear whether such techniques can be generalized, and whether they can be made independent of the pivot order used for solving the system.

A competing method for exploiting the structure of objectives such as (3) is the use of Hessian vector product AD routines in conjunction with CG-like methods. Computing the Hessian vector product takes time $\tilde{O}(\omega(\tilde{\mathcal{G}})^2)$, where $\tilde{\mathcal{G}}$ is a moralization of the computational graph [?]. By contrast, if the computational graph were chordal, then computing the Newton step via Algorithm 1 takes only $\tilde{O}(\omega(\tilde{\mathcal{G}})^3)$. The latter is more economical when the cliques of a graph are small in comparison to the order of the graph. The ill-conditioned nature of structured objectives may also lead to poor convergence properties for CG algorithms.

For problems whose tree-widths are large, the iterative method is more viable. However, following the rapid advances in approximate inference in the past two decades [?], we hope that the explicit algebraic connection to graphical models made in this paper can be exploited in coming up with less-agnostic iterative methods.
Figure 2: LEFT: An optimal limit cycle for the system $\ddot{x} = -(x^3 + \dot{x}^3)/6 + u$. MIDDLE: Convergence of the objective function for the limit-cycle problem (20). RIGHT: Convergence in norm of the Lagrangian gradient and the constraint deviation.














